-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-3891][SQL] Add array support to percentile, percentile_approx and constant inspectors support #2802
Conversation
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
test this please |
Test build #447 has started for PR 2802 at commit
|
Test build #447 has finished for PR 2802 at commit
|
f425daf
to
2a59843
Compare
Reworked on this over the changes of #2762. Updated the pull request comment. |
test this please |
Test build #22810 has started for PR 2802 at commit
|
Test build #22810 has finished for PR 2802 at commit
|
Test FAILed. |
fixed test, please re-trigger test |
retest this please |
Test build #22860 has started for PR 2802 at commit
|
Test build #22860 has finished for PR 2802 at commit
|
Test PASSed. |
|
||
private val function = resolver.getEvaluator(exprs.map(_.dataType.toTypeInfo).toArray) | ||
|
||
private val inspectors = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have to distinguish the isUDAFBridgeRequired
? Can we just use the exprs.map(toInspector).toArray
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously flag added to not disturb UDAFBridge path. Fixed and tested the same.
@chenghao-intel , Thanks for review, fixed and replied on comments. Please check the same. |
ok to test |
Test build #22897 has started for PR 2802 at commit
|
Test build #22897 has finished for PR 2802 at commit
|
Test PASSed. |
private val inspectors = exprs.map(toInspector).toArray | ||
|
||
private val function = { | ||
val parameterInfo = new SimpleGenericUDAFParameterInfo(inspectors,false,false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: spaces after ,
.
Fixed HiveUdaf wrap object issue.
- fixes failure due to return iterator and value type mismatch
e287693
to
a18f917
Compare
Test build #23759 has started for PR 2802 at commit
|
Test build #23759 has finished for PR 2802 at commit
|
Test PASSed. |
@marmbrus Please review and merge the same, thanks |
@@ -172,6 +177,8 @@ private[hive] case class HiveGenericUdf(functionClassName: String, children: Seq | |||
|
|||
override def eval(input: Row): Any = { | |||
returnInspector // Make sure initialized. | |||
if(foldable) return constantReturnValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably unnecessary, as constant folding rule in Optimizer
will replace the foldable
expression with Literal
. Please correct me if there is exceptional case. (and above property constantreturnValue
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constant check and returning value is required for two reasons:
- some UDF functions returns constant iterator when initializeAndFoldConstants called with constant iterators, by executing them once.
But if the same are called with "function.evaluate" they will not return the same constant value type. There will be mismatch in the datatype expected by constantReturnInspector datatype vs datatype returned by function.evaluate.(Ex: org.apache.hadoop.io.Text vs String). This fails unwrap.
So if return Inspector is constant we don't need to call "function.evaluate" as the expression is already evaluated and return value is already present in constant iterator.
This I have uncovered, when I made CreateArrayExression as foldable, then test fails. So modified as part of current defect fix only. - Even if Literal is identified during optimization, expression is evaluated twice, once during return Inspector creation and next during eval function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, I guess there is probably a bug in master as you described, can you paste a query to reproduce the failure? (Text V.S. String).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In HiveQuerySuite, "constant array" testcase was failing
SELECT sort_array(
sort_array(
array("hadoop distributed file system",
"enterprise databases", "hadoop map-reduce")))
FROM src LIMIT 1
[info] - constant array *** FAILED *** (596 milliseconds)
[info] Failed to execute query using catalyst:
[info] Error: java.lang.String cannot be cast to org.apache.hadoop.io.Text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chenghao-intel, I think you understand this code better than I do. Are you satisfied with the explanation? Does this approach seem reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this failure is due to the bug of nested constant Expression <-> ObjectInspector in HiveInspectors
, and I will fix that in #3429.
@gvramana , how about revert the changes in CreateArray
and HiveGenericUDF
? I think we can merge the others first. And you can create a new PR for CreateArray.foldable
which depends on #3429, since it currently doesn't break anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chenghao-intel This PR cannot be separately merged without CreateArray, as percentile_approx accepts only constant array iterator and fails otherwise. I think we can go ahead and merge all these changes as they don't break build or tests, and are not directly dependent on #3429 in order of merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good to me in general, except the unnecessary change in the class |
That change is required, please find the updated comment above. Thanks |
@marmbrus, @chenghao-intel any other comments? Can you merge the same, thanks |
@@ -165,6 +165,11 @@ private[hive] case class HiveGenericUdf(functionClassName: String, children: Seq | |||
isUDFDeterministic && returnInspector.isInstanceOf[ConstantObjectInspector] | |||
|
|||
@transient | |||
protected lazy val constantReturnValue = unwrap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this just be a def
? It will only be called once correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, fixed the same.
Test build #24246 has started for PR 2802 at commit
|
Test build #24246 has finished for PR 2802 at commit
|
Test PASSed. |
Thanks! Merged to master. |
Since #3429 has been merged, the bug of wrapping to Writable for HiveGenericUDF is resolved, we can safely remove the foldable checking in `HiveGenericUdf.eval`, which discussed in #2802. Author: Cheng Hao <[email protected]> Closes #3745 from chenghao-intel/generic_udf and squashes the following commits: 622ad03 [Cheng Hao] Remove the unnecessary code change in Generic UDF
…and constant inspectors support Supported passing array to percentile and percentile_approx UDAFs To support percentile_approx, constant inspectors are supported for GenericUDAF Constant folding support added to CreateArray expression Avoided constant udf expression re-evaluation Author: Venkata Ramana G <ramana.gollamudihuawei.com> Author: Venkata Ramana Gollamudi <[email protected]> Closes apache#2802 from gvramana/percentile_array_support and squashes the following commits: a0182e5 [Venkata Ramana Gollamudi] fixed review comment a18f917 [Venkata Ramana Gollamudi] avoid constant udf expression re-evaluation - fixes failure due to return iterator and value type mismatch c46db0f [Venkata Ramana Gollamudi] Removed TestHive reset 4d39105 [Venkata Ramana Gollamudi] Unified inspector creation, style check fixes f37fd69 [Venkata Ramana Gollamudi] Fixed review comments 47f6365 [Venkata Ramana Gollamudi] fixed test cb7c61e [Venkata Ramana Gollamudi] Supported ConstantInspector for UDAF Fixed HiveUdaf wrap object issue. 7f94aff [Venkata Ramana Gollamudi] Added foldable support to CreateArray
Supported passing array to percentile and percentile_approx UDAFs
To support percentile_approx, constant inspectors are supported for GenericUDAF
Constant folding support added to CreateArray expression
Avoided constant udf expression re-evaluation
Author: Venkata Ramana G [email protected]